A Novel Robust Speech Feature Based on the Mellin Transform and Speaker Normalization
Authors
Abstract
In [1] we proposed a novel robust feature of the speech signal: the modified Mellin transform of the log-spectra of the speech signal (MMTLS). Because of the scale-invariance property of the modified Mellin transform, the MMTLS is insensitive to differences in vocal tract length across speakers, which makes it better suited to speaker-independent speech recognition than the widely used MFCC. In this paper, we propose an improved MMTLS. Experiments show that the improved MMTLS outperforms the original MMTLS in speech recognition performance. For comparison, speaker normalization based on frequency warping (FWP) is also investigated. Experiments show that the speaker-independent recognizer based on the improved MMTLS performs much better than the MFCC-based one, even when the latter is combined with speaker normalization.
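The scale-invariance argument can be checked numerically. The sketch below is only an illustration and is not the paper's MMTLS definition: it assumes the plain Mellin transform M(s) = ∫ f(x) x^(s-1) dx evaluated at s = jω, a hypothetical Gaussian-shaped envelope `spectrum`, and a hypothetical warping factor `alpha` standing in for a vocal-tract-length change. Substituting x = exp(u) turns scaling of x into a shift in u, so the magnitude of the transform is unchanged.

```python
import numpy as np

def mellin_magnitude(f, omegas, u_min=-8.0, u_max=8.0, n=4096):
    """Return |M(j*omega)| for M(s) = integral_0^inf f(x) x^(s-1) dx.

    With x = exp(u), the integral becomes a Fourier integral over u,
    approximated here by a Riemann sum on a uniform grid in u.
    """
    u = np.linspace(u_min, u_max, n)
    du = u[1] - u[0]
    samples = f(np.exp(u))                     # f on an exponentially spaced grid
    kernel = np.exp(1j * np.outer(omegas, u))  # x^(j*omega) = exp(j*omega*u)
    return np.abs(kernel @ samples) * du

def spectrum(x):
    # Hypothetical smooth envelope (assumption): a Gaussian bump in log-frequency.
    return np.exp(-0.5 * (np.log(x) / 0.5) ** 2)

omegas = np.linspace(0.0, 5.0, 11)
alpha = 1.2  # hypothetical frequency-scaling factor between two speakers

original = mellin_magnitude(spectrum, omegas)
warped = mellin_magnitude(lambda x: spectrum(alpha * x), omegas)

# The envelopes themselves differ clearly under scaling ...
x_grid = np.exp(np.linspace(-3.0, 3.0, 601))
raw_diff = float(np.max(np.abs(spectrum(x_grid) - spectrum(alpha * x_grid))))

# ... while the Mellin magnitudes agree to numerical precision.
max_diff = float(np.max(np.abs(original - warped)))
```

Here `raw_diff` measures how much the scaled envelope differs from the original, and `max_diff` measures how much their Mellin magnitudes differ. A feature built on the latter is, under these assumptions, unaffected by a pure scaling of the frequency axis.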
Related resources
A novel robust feature of speech signal based on the Mellin transform for speaker-independent speech recognition
This paper presents a novel speech feature, the modified Mellin transform of the log-spectrum of the speech signal (MMTLS for short). Because of the scale invariance property of the modified Mellin transform, the new feature is insensitive to the variation of the vocal tract length among individual speakers, and thus it is more appropriate for speaker-independent speech recognit...
Convolutional Neural Network with Adaptive Windows for Speech Recognition
Although speech recognition systems are widely used and their accuracy keeps improving, a considerable gap remains between their accuracy and human recognition ability. This is partly due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
A Wavelet Based Approach for Speaker Identification from Degraded Speech
This paper presents a robust method for speaker identification from degraded speech signals. The method uses Mel-frequency cepstral coefficients (MFCCs) for feature extraction from the degraded speech signals and from the wavelet transform of these signals. It is known that the MFCC-based speaker identification method is not robust enough in the presence of noise and telephone degradations....
A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain
The quality of a speech signal degrades significantly in the presence of environmental noise, which impairs the performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, single-channel enhancement of speech corrupted by additive noise is considered. A dictionary-based algorithm is proposed to train the speech...
Mel- and Mellin-cepstral Feature Extraction Algorithms for Face Recognition
In this article, an image feature extraction method based on two-dimensional (2D) Mellin cepstrum is introduced. The concept of one-dimensional (1D) mel-cepstrum that is widely used in speech recognition is extended to two-dimensions using both the ordinary 2D Fourier transform and the Mellin transform. The resultant feature matrices are applied to two different classifiers such as common matri...
Journal title:
Volume, issue:
Pages: -
Publication date: 2005